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1 Introduction 



The quest to understand mechanisms behind the temporal dynamics of a natural population (animal or plant) always 
yields useful information for ecological biodiversity management. The present work was motivated by the analysis 
of time-fluctuations of an ecological time series, the annual larch cone production. Because of the impracticabil- 
ity of quantitative evaluation of such population size, only semi-quantitative data are often available numerically. 
Data are coded with finite ordered categories or levels. From this some natural questions arise. Among them, the 
following one is of crucial interest here: do lagged values determine future production? The same basic problems 
occur as for classical quantitative time series but here the greatest difficulty stems from the nature of the studied 
process (as previously mentioned, working on categorical variables induces many difficulties since most of the 
notions used for quantitative variables have no more sense in such context). 

Statistical models have been useful instruments for testing hypothesis concerning the mechanisms behind temporal 
evolution and to cha racterize temporal patterns. Two m odels are used throughout this paper t o achieve this goal: 
a regre ssion model ( iFokianos and Kedeml l2002l 12003 ^ and a parametrized Markovian model dJacobs and Lewis , 



1978ah . The first one is a regression model for categorical time series which is based on generaUzed linear regres- 



sion theory (IMcCuUagh and Nelden. 119891) . Such model extend linear models to accommodate both non-normal 
response distributions (which is the case in the study of categorical data) and transformations to Unearity. So, ap- 
plying a generalized linear models consists in two choices: a family of probability distribution and a link function 
between the re sponse and the predictors. Fo r categorical data some widely used models are: multinomial-logit 
( Agresti . 1990 ) and cumulative odds models ( McCuUagh . 19801) . Adaptation of such models to categorical times 
series is easy to do putting past ob servations at different lags as categorical predictors of the response at time t 
(IFokianos and Kedeml l2002l l2003b . The second one is indeed an adaption of the discrete auto-regressive (DAR) 



model introduced by Jacobs and Lewis (Il978ah to the case of categorical time series. As noticed by McKenzie 
(120031) . DAR models « would be more suited to modelling dependent sequences of categorical observations, but 
this does not seem to have been attempted yet ». To the best of our knowledge, no advance in this direction is made 
since the paper of McKenzie. 

These two models have some advantages and some disadvantages which are not necessary the same, implying a 
complementarity between these two approaches. Among the common advantages, the main one is that they are 
easy to be interpreted by the practitioners. Since most of the time series in ecology are short-length (for a statistical 
purpose), we have to consider only models involving a reasonable number of parameters. That is the reason why 
we will focus on one order lagged model (even these models can be extended easily to large order lag values). 
Among the inconvenient of the DA R model, the main one is the stationarity of the time series, which can not be 
checked by any statistical tests (see (IMcGee and Harrisll2005h for a discussion about several notions of stationarity 
for categorical time series). However it allows us to derive a simple model for taking into account missing values (a 
contrario to the regression model, the DAR can not tr eat directly the case of missing values). Our approach differs 
highly of the one recently proposed by Bandt (l2005h . Indeed he considers a continuous-state, but non-Gaussian, 
time series and its analysis relies only on the ordinal property of IR. Moreover his methodology requires a long 
time series, which is not realistic in many real cases. 

Motivation of the present work is the analysis of time series of larch cone production data in spatially disjoint 
locations in order to determine some temporal patterns of larch cone production dynamic at different locations 
and to discuss some kind of spatial synchrony. Data are detailed in the first section. Next section 3 is devoted to 
present two regression models: one for categorical time series and o ne for ordinal time series. These two regres- 
sion models have been studied by Fokianos and Kedem ( l2002tl2003b . In section 4 we adapt the one order discrete 
auto-regressive model to the context of categorical data (the ordinal characteristic is not taken into account in this 
model). In particular we develop independence tests and estimators of the various parameters of the model. In 
section 5, we apply these models to two real data sets: the first one deals with annual larch cone production (over 
31 years) and the second one with weekly planktonic abundance (during one year). Last section is devoted to 
conclusion and discussion. 
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2 Motivations 



The masting is the intermittent synchronous production of seed crops by a plant population dKellv and Sorkll2002h . 
It often shows an evolved strategy related to others environmental masting patterns such as rainfall, temperatures, 
. . . Thus variability in seed production according to past values is a good descriptor of environmental changes in 
climate for example. The information arising from the characterization of tempor al patterns on such time series 



could be used to infer role of environmental parameters and other mechanisms jPrice et al. 



2006). The data 



accounting cone production were registered for 3 1 years on four valleys located in the Southern French Alps (in 
the same area of the Alps called "Brian9onnais"). Here we will consider four different sites selected to be at 
comparable altitudes (ranging from 1800m to 2200m): Ayes (altitude: 2200 meters), Montgenevre (altitude: 2200 
meters), Nevache (altitude: 1800 meters) and Prorel (altitude: 1800 meters). Cone production at a given site was 
roughly estimated at the beginning of the cone development by counting cones along one meter of branch for at 
least one hundred rando mly se lected trees. The intensity of larch cone production at any site was then classified 
into six classes (Roques, 1988|) from no cones (coded 0) to very heavy crop i.e. more than two hundred cones per 
tree (coded 4). Annual cone production is considered to be the realization of an ordinal time series with values 
{0, 0.5, 1, 2, 3, 4} corresponding to a scale classification endowed with a natural ordering. Data are plotted on 
figure [U 

When studying the dynamic of the larch population on each sampling sites, a first step could be to identify temporal 
patterns of cone production and then to compa re each patterns from o ne site to others to conclude or not at a spatial 
synchrony on a "short" regional spatial scale (iLiebhold et a/.L 120041) . However, the observed series in figure[T]do 
not exhibit obviously the presence of such patterns. The salient features of the series are: no seasonality, high 
location to location variability with respect of duration and beginning of intensive larch cone production, presence 
of missing values, . . . However visual remarks should be considered carefully. 

Such time series could appear as too short-length for the statistician who generally needs a lot of information to 
infer on a phenomenon but the data are collected from 1975 to 2005, which corresponds to an entire career of a 
biologist! 



3 Regression models for categorical and ordinal time series 



The model used here is a generaliza tion of classical re gression models to the case of t i me-depende nt categori- 
cal observations and was studied by / Kauffmami 1987 ) (see also ( Fokianos and Kedem . 2002 , 
summary of the main theoretical aspects). 



20031) for a good 



3.1 Introduction to generalized linear models for qualitative time series 

Assume that the observed series is a particular reaUzation of the stochastic process in discrete time {Yt] which 
will be described below. Values of Yt are supposed to belong to a finite set £^ = {1, . . . , A:} of fc ordered or not 
categories. Because we are interested in temporal dependence between successive observations, we condition on 
the observed past. For any positive integer I, let us denote by Tt-i the cr-field generated by Yj-i, lt-2, ■ • ■ , Yt-i- 
Let Yt = {Yt,i, ■ ■ ■ , yt.k-i)' where Ytj equals to 1 if the j-th is observed at time t and otherwise. The analysis 
of time series based on a generalized linear model suppose that the response variable is influenced by its past 
values which are viewed as predictors influencing the distribution of Yt by way of a transformation of a linear 
combination. 

B[Yt\Tt-i]^h{Y't_iP), 

where I is the order of the lag time and Y{._, is the covariate matrix containing the lagged values of the response 
variable until lag I. In the following we will focus on I S {0, 1, 2} (since we aim at treating short-length time 
series). The vector /3 is a vector of time-invariant parameters to be estimate which will reflect the intensity of the 
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dependency between the response and its past. Because the response variable is a categorical time series, we have 
the following relation: 

for every j € {l,...,k} and every t{l,...,n}, where TTt,j,i is a transition probability. Let TTt,-,i = 
(TTt i . . . jTTf fc_i i). Some adequate regression models for categorical data falls in the family of generalized 
linear models which links vector of transition probabilities of the response vector to the covariate process through 
the equation: 

7rt^,r.= 7:t,.AP)^HX-iP) or (/?))= Y;_;/3 . (1) 

In other words, the study of having the response Yt = j at time t is equivalent to carry out a regression on covariates 
which are the lagged values of the categorical response process. This model is also called a Markov regression 
model for categorical time series. The function h is called the inverse link function and is related to a link function 
that describes how the mean depends on the linear predictors. For each response distribution there exists a variety 
of link functions to connect the mean with the linear predictor. The use of a generalized linear model is the choice 
of a combination of response distribution and a link function. 



3.2 On the choice of the Unk function 



The link function should adapted to the type of data jFokianos and Kedem , 



2003): 



• Nomin al data: the m ost commonly used model for categorical (or nominal) data is the multinomial logit 

cxp(/3jj/t_;) 



model (lAgresti , 



199C): 



1 + E,;i exp(/3;yt-0 



for any j € { 1 , . 



1}. This equation also defines log-odds ratios relative to TTtm by: 



log 



= /3'yt- 



(2) 



(3) 



Ordinal data: since data are ordinal, its is more convenient to model the cumulative probability function of 
Yt. For ordered categorical time series a re asonable choice of link function is the logistic distribution one 
which leads to the proportional odds model dMcCullaghi 1 1 9801) : 

1 



h-\x) 



1 + cxp(— a;) 



It follows that the link function is: 



log 



P[Yt<j\Tt-i] 
P[Yt>j\J^t^i] 



(4) 



(5) 



3.3 Parameters estimation and global adequacy criteria 

Since the joint distribution of response and covariates is often not easy to establish, the likelihood methods are not 
applicable to estimate the vector of regression coefficients /3. As we are interested in the estimation of the effects of 
the cova riates on the response, we can use the inference th eory based on partial likelihood function. The reader can 



refer to dFokianos and Kedeml 12002 ; 



Viennet et al. 



199a) for more details and application. The partial likelihood 



method leads to non linear equations system. Multinomial models were fitted using the function mult inom from 
library section nnet on R. Proportional-odds logistic regression models were fitted using the function polr from 
library section MASS on R. The vector of parameters of this model is estimated using an iterative weighted least 
squares IWLS (IChakners and HastieLll992tlVenables and RipleyLl2002h . 

The analysis of the global adequacy and goodness of fit of such models to the data is discussed using the Akaike's 
information criterion (AIC) which also allows to compare several models. The values of this criterion depends on 
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the number of model parameters and penalizes models with large number of parameters. Such consideration is 
important in the study of short time series where the number of parameters can be rapidly equal to the length of 
the time series. The chosen model is the one which minimizes the value of AIC among the others. 

In this preliminary work, no detailed analysis of the residuals of the models is done. Such analysis is important to 
assess the goodness of fit between the chosen model and the observed data but was not the priority of this paper. 



4 Discrete auto-regressive model and categorical data 



The discrete auto-regressive (DAR) model introduced by Jacobs and Lewis (Il978at Il978bt Il978cl) is used here 
to model categorical data. Some independence tests are developed, using either the Markov property or runs 
properties. Estimators of the parameters are studied in the precise context of categorical data. Simulated data are 
used to illustrate numerically the quality of these estimators. 



4.1 Introduction and model 



In a series of papers, Jacobs and Lewis (Il978at Il978bt Il978cl) introduced and studied time series models for 



discrete variables. Among them we will focus here on the discrete auto-regressive of order 1, denoted by DAR(l). 
Such process {Xt} is a discrete-time stochastic process with values on a finite ordered set E = {1, . . . ,k} and is 
defined as follows: 

V<> , Xt = VtXt-i + (1 - Vt)Zt , 

where {Vt} is a sequence of iid Bernoulli random variables with parameter a E [0; 1] and {Zt} is a sequence 
of iid random variables having the distribution tt on E, the two sequences being independent. Moreover we will 
assume that Xq is distributed according to the distribution tt, implying that the process {Xt} is stationary. The 
case of a = 1 is not interesting since Xt — Xq, with probability 1, for any t. The case of a = means that the 
process {Xt} is simply a sequence of iid random variables having distribution tt. Hence the parameter a could be 
interpreted as follows: the nearest to a is, the more « independent » the sequence {Xt} is. Indeed, for all /i e IN, 
p{h) — a'' is the auto-correlation function of a DAR(l) process. Hence DAR(l) models can be used to describe 
a situation of short range dependency with high correlation. It is easy to prove that stochastic process {Xt} as 
defined above is a Markov chains on E with transition matrix P given by the following equation: 

P = aI+{l^a)Q , 

where *(5 = [*7r| • • • |*7r]. Such Markov chain admits obviously a unique stationary probability distribution which 
is TT. One can easily deduce the h-th power of Q and P: for all /i ^ 1, Q'' = Q and P'' = a'^I + (1 — a'^)Q, 
illustrating one more times the role of a. 

This stochastic process could be generalized to higher order leading to the DAR(p) model. In fact, these models 
appear themselves to be a special case of mixture transition distribution (MTD) model introduced by Raftery 



(Il985h . Thus DAR(p) can be viewed as an alternative to MTD model. According to Raftery, a MTD model fits 
better data in general than a DAR(p) one, especially for p ^ 3. However here we will prefer to use a DAR(l) 
model since it has the following advantages over the MTD model: 1) the two parameters a and tt play different 
roles: a is related to the correlation whereas tt is the stationary distribution; 2) these models involve generally a 
reasonable number of parameters (more parsimonious) especially when few data are available; 3) parameters could 
be easily interpreted by a practitioner. But the special case of DAR(l) model presents the disadvantage of being 
restrictive over the transition matrix. 

Here we are interested on the use of such stochastic processes for modeling categorical variables (here the k 
different modalities are encoded by using the k first positive integers). It implies that many characteristics of 
these processes have no sense in such context, as it i s the case for the auto-correlation function (see above). 



Thus estimators developed by Jacobs and Lewis (119831 see pages 28-30) cannot be used. Hence we address the 
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statistical problem of estimating the parameters in a DAR(l) model in presence of categorical data. Assume we 
observe Xo, . . . , Xn for a fixed value n > 0. First we will test whether {Xt} is a sequence of independent random 
variables (a — 0) or not. In a second step we will estimate all the parameters of the model: a and tt. Then we 
propose a very simple model in order to consider the case of missing observations. 

4.2 Independence tests 

In this section we aim at testing whether {Xt} is a sequence of independent random variables (a — 0) or not. Two 
ways will be investigated. The first one will use the Markov property of the DAR(l) model and the second one 
will be based on runs property. Anyway, all along this section, the null hypothesis Hq will be « a = » and the 
alternative hypothesis will be « a 7^ ». 



y te st based on the Markov property The following test is a classical test for Markov chain (see (IReinert et al. 



2000l) for an illustration in DNA analysis context). We only use the fact that {Xt} is a Markov chain, but not the 
particular structure of its transition matrix. Classical results on Markov chain inference leads to the following 
estimate for the transition matrix P: 



P. 



j-j' 



where N^^^, = and iVj;. = E/ee - ELi In other words N^^ is the 

number of jumps from state j to state j' and iV". is the number of visits of state j, in the sequence of observations 

Xq, . . . , Xn- 



The null hypothesis can rephrased as follows: Pjjr — Pj .P. ji, for any £ E^. Under iJp, the maximum- 



likelihood estimate of Pj^jt is: 



p. ., — P p -, — 
^ i.'i' — 1.- ^ - .1' 



3,- -J 



ln-1 ' 

where Af.^^E.eiJ^L'^ nil 1 {x,=j'}- Hence one has to consider the following statistics C : 

Theorem 4.1 Under the null hypothesis, — - — > x?i._i-,2 ■ 

Some well-known practical restrictions exist in order to be able to apply this test. As example, one can require 
that Pj^f > 5%, for any (j, j') e E^. 

Tests based on runs property Unfortunately we cannot compute the power of the previous test, that is the reason 
we will now consider a second family of tests. These tests will be based on runs property of the model. Runs in 
sequence of iid Bernoulli distributions are studied for a very long time: this problem seems to be considered for 
the first time by Abraham de Moivre in 1756 (problem LXXIV in hi s book The Doctrine of Chances). For an 



historical perspective, see the introduction of the Part I of (lMoodlll940t) . Most of the existing papers deal with the 



case of Bernoulli random variables, but here we are indeed intere sted in the general discrete case. Few extensions 



were made in this direction. To the best of our knowledge. Mood (119401) is the first one who studied it. 



A run can be defined as follows: it is a consecutive sub-sequence of identical values in a sequence of random 
numbers. For any j e E, let us denote by „ the number of non-overlapping j-runs of length i in the sequence 

Xl, . . . , Xn'. 

R].n = ; x,n-i 7^ j ,, a:„i = j , x„,+i^i = j , , x„,+i ^ j}\ ■ 

Let us now define the number Rj n of j-runs and the total number i?„ of non-overlapping runs: 

n 

-^J'" ~ ^ ^),n ' and Rn = ^ Rj,n ■ 
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Mood d 19401) obtained the limiting distribution of i?„ after renormalization. Two cases have be distinguished: 

A: = 2 and fc > 2. 



Theorem 4.2 (corollary 5 p. 390 and corollary 3 p. 392 in AMood\l94(i} } 

Rn - 2n7ri7r2 d 



1. Ifk = 2, 



2^?i7ri7r2(l - 37ri7r2) 
2. Ifk > 2, -^(i?„-n(l-E,e^^'^' 



AA(0,1). 



MiO,al), with al = Y.,eE^] 



2E 



One can check that in both cases the asymptotic variance is degenerated if and only if there exists j E E such 
that TTj = 1 (and then, for all / ^ j, ttji = 0), then the variance is degenerated. These convergence results could 
be used to construct an asymptotic test. 

An alternative solution could be to consider the longest run in the sequence Xi, . . . , X„. Indeed there exists many 
works dealing with the case of either independent trials or Markovian trials. Let us denote by L„ the longest length 
of all runs in the sequence Xi, . . . , Xn. Using the previous notations, we have; 

Ln — max{i ; 3j s.t. > 0} . 



Vaggelatou (l2003b studied this random variable in the case of multi-state Markovian trials. It requires that {Xt} 
is an irreducible and aperiodic Markov chain on a finite state space E with transition probability matrix P and 
unique stationary measure tt. The Markov chain induced by a DAR(l) model is irreducible and aperiodic if tt > 
(meaning that all the components of tt are strictly positive) and a ^ 1 (see chapter XV of (lFellen,ll968h ). Let us 
define the two following quantities: 



P - 



max P-j. 
jeE ■'■ 



and 



If p < 1, then Vaggelatou proved the following asymptotic result (theorem 1 in (IVaggelatoui 12003^ ): 



P(L„ - [logi/^n] <x)= exp {-n(l - p)7rpp[i°Svp + o{l) , 



(6) 



where [•] denotes the integer part and o(l) means that the residual te rm is « small » in regard with n. This 
result extends the classical one obtained many years ago by Goncarov (119621) in the case of iid Bernoulli trials. 
In both case, Ln — [logi / „??■] d oes not have a limit distribution, but only certain sub-sequence; for instance, 
theorem 2 in dVaggelatoij, l2003b gives a case where the sub-sequence converges in distribution to the Gumbel 
distribution. We will use theorem 1 (and not theorem 2) to construct a third (and last) test since we have not 
enough observations in real situation. Let us denote by po and pi the value of p respectively under the null and the 
alternative hypothesis. Under the null hypothesis, P will be equal to the matrix Q as defined in the introduction: 
^(ii j') G E"^, Pjji = TTji (the transition probability from state j to state / does not depend on j). So we have 
that po = max{7rj ; j G E}: po < 1 if ™d only if, for any j G E, ttj < 1. Under the alternative hypothesis. 



pi — maxjPjj ; j G E} = maxja + (1 



j £ E}: pi < 1 if and only if a < 1 (it is initially assumed) 



and for any j G E, iVj < 1. Thus we find the same condition in both cases and this assumption is the same as for 
the previous test. From now we will assume that tt > in addi tion to the previous assumptions (let us recall that 
we already assume that a ^ 1). Using theorem 1 of Vaggelatou (l2003h . one could obtain an asymptotic confidence 
interval with a prescribed confidence level e E (0; 1): 



Ho 



logon - 



ln(e/2) 



; logon - 



ln(l - e/2) 



= 1 



L 



n(l - po)7Tpa ) '' ''° \ n{l ~ Pq)t:po 

1 (notice that it is corresponding sometimes to the definition of runs: see for instance (iJacobs and Lewis , 
1978a|)). It follows that the power fl^ of this test is: 

ln(£/2) 



n. = 1 



Hi 



''P" V " (1 - po)^po 

To compute 11^, one has to use equation (|6]l above. 



Hi 



Ln < l0gp„ ( - 



ln(l - e/2) 
n{l - po)npg 



1 



4.3 Parameters estimations 



The two parameters tt and a of such DAR(l) model could be estimated separately since by construction they play 
different role. 

Estimations of tt For any j E E and for any i G {1, . . . , n}, let ~ lil^Xi=j}' these random variables have 
the Bernoulli distribution with parameter iTj . A natural unbiased estimator of iTj is therefore: 



1 " 

1=1 

Moreover, using the expression of P'^ given in the introduction, one can easily derive the variance of 7fj : 



1 2 
Var[%] = -7rj(l - tTj) + —(1 - TTj)TTjVn(a) 



and the covariance between ttj and tTj/ (with j' ^ j): 



^ ^ 2 

Cov[7fj,7fjv] = -TTj>TTjVn{a) 



with y,i(Q!) ~ X]h=i("^ ~ Applying formula (0.113) in (IGradshteyn and RyznikL Il965h (arithmetico 

geometric progression), we obtain the following expression for Vn (a): 



rt — a" a(l — a" ^) 

Vn{a) = n 

1 — a 1 — a r 



This leads to the following Umit for the variances and the covariances: 

lim nVar[7f,] = — — — 7rj(l ~ tt,-) , 

and: 

^ - 1 2a 
lim riCov[7r,-, 7r,-/J = tTj-'TT,- . 

n^oo 1 — a 

As a consequence of Bienayme-Chebychev inequality, the asymptotic result on the variance implies that ifj is 
consistent: 

Proposition 4.1 For any a G [0, 1), tTo > tt,-. 

n — >oo 

Moreover o ne can prove the following central limit theorem for ttj by application of the ergodic theorem for 



Markov chain (iJonesL l2004t) and Slutsky theorem 



^ TT; — TT, d . ( 1 + a 

Theorem 4.3 For any a £ [0, 1), V^. , ^ = M 0, 

When a = 0, the asymptotic variance equals to 1 as it is well-known for Bernoulli trials. The largest a is, the 
larger the asymptotic variance is. Since the asymptotic depends on a which is generally unknown, we cannot yet 
use the last proposition to construct confidence interval. In order to do it, we will need an consistent estimator of 
a. 

Estimations of a We will first consider the maximum Ukelihood estimator of a, assuming that tt is known. Since 
{Xt} is a Markov chain, the log-likelihood is: 

/:(Xi,...,X„;a) = %/log^w'(«') , 
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where Njji is defined as in section 2 and where Pjj' (a) is the transition probability from state j to state j' (which 
only depends on a). Replacing the expression of transition probabilities, we obtain that: 

£(Xi, . . . , X„; a) = ^ iV,-, log(a + (1 - a)n,) + ^ N,,^, log((l - a)nj,) 
J6S \ ]'eE\{j} 

It follows that the maximum likelihood estimator al of a is the solution of the following equation: 



n ^ — ' 



3-J 



n ^ a + (1 — alTT,- 

When TT is unknown, one can use the estimation given above and so the plug-in estimator Oi of a is the solution 
of the following equation: 



-E 



n: 



3-J 



n ^ a + (1 — aVKj 



1 



0L2 = 



Unfortunately we cannot derive an explicit expression of oi . An alternate possible way is to minimize the following 
function: 

U.r}eE^ 

Solving this optimization problem leads to an explicit expression a2*'. 

E (1 " ) '-^JJ " '^i ) " E E ^^rr - T^j' ) 
* _ jeB jeEj'eE\{j} 

(fc-i)E-l + E(i--.)' 

jeE jeE 

It seems to be difficult to establish properties of this intuitive estimator. Hence it is not recommended to use it 
as an estimator of a. However it could provide a possible initialization for an optimization procedure to obtain a 
numerical value of al- The corresponding plug-in estimator oi of a is given by the following expression: 

jeE jeEj'eE\{j} 

(fc"i)E^I + E(i-^.)' 

jeE ]eE 

Simulated data The estimators developed above are applied on simulated data in order to evaluate numerically 
their performance. We simulate data with various values of a, vr and n: 

. ^=(i,i),^=(i,|),^=(i,i,i)and7r = (i,i,i) 

• a e {0.1; 0.2; 0.5; 0.8; 0.9}. 

• n e {50; 100; 500}. 

Hence two cases are studied: k = 2 and fc = 3. For each values of these parameters, we simulate m = 100 
independent DAR(l) Markov chain and then we compute the estimators. Unfortunately these we have no guarantee 
that the two estimators of a belong to the unit interval. Hence we precise for each cases the number of samples on 
which the computations were done (as it is reasonable, the number of samples is increasing with the number n of 
observations). Results are given in tables [T] to H] (only the estimators are computed for simulated data). 

4.4 A variant with missing observations 

Sometimes categorical time series may contain some missing values/observations. Here we now propose a very 
simple adaptation of the DAR model in order to taking into account the missing values. Since the DAR model is 
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stationary, it will be easy to derive similar expressions as in the initial model. 

Assume that at each unit of time, the probability of a missing value equals to /3 (which does not depend on the 
time). Hence, if we denote by Zt the values at time t, we have : 

Zt = 

where —1 is the value corresponding to a missing value and {Xt} is DAR(l) stochastic process as described 
previously. Hence /3 is the probability that a value is not observed : this probability is assumed to be not depending 
on i. Since (Xt) is a stationary stochastic process, it follows that (Zt) is still a Markov chain, but taking values 
on the set £■ ~ { — 1} U £'. Its transition probabilities matrix P can be expressed in function of the transition 
probabilities matrix P of (Xi): 

1-/3 /3*7r 
(l-/3)lfc /3P 

where 1^ is the unit vector of R*^. There is now three parameters to be estimated. Indeed tt and a (with the 
maximum likelihood method) can be estimated as previously. The extra parameter /3 can be simply estimated as 
follows: 

5 Applications to ecological data 

Finally tests and estimators are applied in this section to real data. We apply the two models described previously 
to two real data sets. The first one deals with larch cone production (see section 2) and the second one to planktonic 
abundance. 



P = 



5.1 Larch cone production 

The total number of observations was n = 31, with k — ^ categories. The goal of this work is to study masting on 
such trees. For a full description of the data, see section 2. 

First we apply regression models. Tables|5]and |6]contain values of AIC respectively for the categorical time series 
regression model and the ordinal one. We also indicate the number of parameters to estimate and the number of 
observations (these time series may contain missing values). For the first case, the independence assumption leads 
to the better for all sites. For the second case, model with a lag order 1 and 2 fits better for the site Ayes 2200 and 
Montgenevre 2200 while the model with only a lag of order 1 fits better for Nevache 1800 and the independent 
model fits better for Prorel 1800. Comparing values of AIC, the model for ordinal time series seems to be more 
accurate for these data sets. 

Second we apply the DAR model. Table [T] contains the estimations of the three parameters for each of the four 
data sets. In all cases, the independence hypothesis is rejected with the two first tests, while the third one leads to 
accept the assumption of independence (all with a first type error at 5%). However the power of this last test is 
more or less weak in all cases (ranging from above 26% to 51%). Thus it is reasonable to reject the assumption of 
independent observations. 

In all cases the categorical time series regression model leads to accept the temporal independence between obser- 
vations. It may be due to the fact that the parameter a in the DAR model is closed to zero. However this parameter 
can be assumed to be significantly different of zero, according to the performed tests. It is in concordance with the 
fact that observations are time-dependent when using the ordinal time series regression model. Time series studied 
here are very short-length and it may the cause that the conclusions based on the ordinal time series regression 
model and the ones based on the DAR models. Since few data are available, one should prefer to use the DAR 
models (because it involves less parameters than the other models). 
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5.2 Planktonic abundance 



We now consider weekly planktonic (Thalia democratica) abundance data. Data were kindly given by F. Menard. 
In such context, the objective is to test and to compare the temporal patterns from one year to another. Hence 
we apply the two models described above for four years (1987 to 1990). It follows that each data-set is made of 
n — h2 observations. Abundances were determined semi-quantitatively according to classes defined on scale of 5 
values, E — {1, 2, 3, 4, 5}. The observed series are shown in figure|2]and exhibit the same problems as the larch 
cone production ones. Notice that t he fifth category wer e not obse rved for any year Fo r a complete description of 
the data, the reader could reefer to (IMenard et adll993h (see also (IViennet et a/.lll998h V 

First we apply regression models. Tables [8] and |9] contain the number of parameters to be estimated, the values of 
AlC and the number of observations, respectively for the categorical time series regression model and the ordinal 
one. With the model for categorical time series, the model with one order lag fits better all the four years. With 
the model for categorical time series, model with two order lag fits better for the year 1987 and 1989 while model 
with only a one order lag fits better for the two other years, 1988 and 1990. 

Second we apply the DAR model. Table [TOl contains the estimations of the three parameters for each of the four 
data sets. The two first tests leads to reject the null hypothesis, i.e. to reject the independence of the observations. 
The third test leads also to reject the independence assumption for the two last year (1989 and 1990) while the null 
hypothesis is accepted according to this last test for the years 1987 and 1988 (respectively with a power equal to 
44% and 66%). Thus it is also reasonabke to reject the assumption of independent observations. 

We can also analyze these data sets as one unique time series (notice that it was not possible for the previous 
data-set). Hence the sample size is now n — 208. All the tests lead to reject the assumption of independent 
observations (with a power equal to 69% for the last one). The last line of table [TOl contains the estimations of the 
three parameters for the whole period. 

Here the situation is totally different than previously. Indeed the categorical time series regression model and the 
DAR model lead to the same conclusion, i.e. a one order lag dependence in the time series. One can notice that for 
these data sets the parameter a (of the DAR model) is now between 0.308 and 0.540. However the ordinal time 
series regression model leads in some case to a two order lag dependence. Since the number of observations is 
almost the twice than for the first data sets, one should rather prefer to use the ordinal time series regression model. 



6 Conclusions and discussions 

Applications to real data achieve to convince that these two complementary models are relevant for practical pur- 
pose. Based on the result obtained over real data, one can conclude that either the ordinal time-series regression 
model or the DAR model should be used to treat such data. Indeed in all cases the categorical time-series regres- 
sion model seems not to present advantages over the two other models. The choice between the ordinal time-series 
regression model and the DAR model depends highly on the context, i.e. ess entially on the number of observations 



and the number of parameters to be estimated. Fokianos and Kedem (120031) claimed that « the regression method- 
ology can discover dependencies in the DNA sequence data which cannot assessed by a Markov model ». However 
data treated here can serve as a counter-example of this sentence. For the two data sets studied here, conclusions 
based on the regression models and the one based on the Markov model are almost identical. 

From this work, one can conclude that in any case the DAR model has only few parameters to be estimated, but 
with an equal to number of unknown parameters (as in the example of larch cone production) one has to prefer the 
ordinal time-series regression model. 

However both suffers of relying on assumptions or simplifications. Hence these models could be extended in the 
following ways: 
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• Stationarity: the DAR model is strongly stationary (in the sense defined by McGee and Harris in (l2005b ). 
This assumption should be checked with any statistical tests. However no test of stationarity of a categorical 
time series has been developed to the best of our knowledge. Anyway when dealing with short-length 
time series stationary assumption is not really restricti ve. O therwise a solution could be in applying the 
de-trend algorithm suggested by McGee and Harris in (I2005I) . However their algorithm is more and more 
computationally complex as the number of states is increasing (in fact they mainly consider the binary case). 
We do not focus here on the study of the possible stationarity of a categorical time series which will be done 
in a future work. A major advantage of regression model is that it is not necessary to have stationarity. 

• Higher dependence order and number of parameters: since we consider the case of short-length data, we 
hmit our study to one or two order lagged models. Indeed both models could be applied to p-th order 
lagged models. However, for the regression model, a large value of p implies a large number of parameters 
to estimate, that may induce some numerical instability (due to correlation between the regressors). The 
number of parameters in a DAR(p) model is lower, but if p > 1 we have no more the Markov property. 

• Environmental factors: these models do not include environmental covariates. T he regression model could 
easily integrate such situations, as shown by Fokianos and Kedem (120021 : 



2003 ). For the DAR model, it is 



not so easy. A solution could be to consider inhomogeneous Markov chain or a state-space model. 
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Figure 1 : Annual larch production in four sites in the Southern Alps 
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Table 1: Results obtained with tt = (i, ^) 
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(a) Year 1987 



(b) Year 1988 
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(c) Year 1989 (d) Year 1990 

Figure 2: Weekly planktonic abundance for four years 
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Table 2: Results obtained with tt = (i, |) 
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Table 3: Results obtained with tt = ^, ^) 
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Table 4: Results obtained with tt = (3,^,3) 
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Table 5: Categorical time-series regression models appUed to annual larch cones production 
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Table 6: Ordinal time-series regression models applied to annual larch cones production 
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Table 7: DAR models applied to annual larch cones production 
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Table 8: Categorical time-series regression models applied to weekly planktonic abundance 
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Table 9: Ordinal time-series regression models appUed to weekly planktonic abundance 
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Table 10: DAR models appUed to weekly planktonic abundance 
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